ABSTRACT

The method of moments is used to characterize the asymptotic
behavior of the central moments of the sample occupancy numbers
from the multinomial distribution with equal cell probabilities.

The limiting behavior is then used to establish asymptotic normality
when the sample size $n$ and the number of cells $N$ tend to infinity
so that $n/N \to \alpha$, $0 < \alpha < \infty$.


THE LIMITING DISTRIBUTION OF THE SAMPLE OCCUPANCY NUMBERS
FROM THE MULTINOMIAL DISTRIBUTION WITH EQUAL CELL PROBABILITIES

B.  Harris  and  C.  J.  Park 


1. Introduction. Assume that a random sample of $n$ observations has been made
from a multinomial population with uniform cell probabilities, that is, cell $i$ has
probability $N^{-1}$, $i = 1, 2, \ldots, N$. Let $s_i$ be the number of cells which occur
exactly $i$ times in the sample. Then, we clearly have

$$\sum_{i=0}^{n} s_i = N \quad\text{and}\quad \sum_{i=0}^{n} i\,s_i = n. \tag{1}$$

The random variables $s_i$, $i = 0, 1, \ldots, n$ will be called the (sample)
occupancy numbers in agreement with usage in past publications of the authors.
(Wilks [10] refers to these as the cell frequency counts.)
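The definition of the occupancy numbers and the two identities in (1) can be illustrated with a short sketch. The function name `occupancy_numbers`, the seed, and the values of `N` and `n` are illustrative assumptions, not part of the paper.

```python
import random
from collections import Counter

def occupancy_numbers(sample, N):
    """Return [s_0, ..., s_n] for a sample of cell labels drawn from range(N)."""
    n = len(sample)
    per_cell = Counter(sample)          # cell -> number of observations in it
    s = [0] * (n + 1)
    for count in per_cell.values():
        s[count] += 1                   # one more cell occurring `count` times
    s[0] = N - len(per_cell)            # cells that never occur
    return s

rng = random.Random(0)                  # fixed seed, illustrative only
N, n = 10, 25
sample = [rng.randrange(N) for _ in range(n)]
s = occupancy_numbers(sample, N)

# The two identities in (1): the s_i count all N cells and all n observations.
assert sum(s) == N
assert sum(i * s_i for i, s_i in enumerate(s)) == n
```

The two assertions hold for every sample, since each cell contributes to exactly one $s_i$ and each observation lies in exactly one cell.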

Our interest in the behavior of the occupancy numbers is motivated by their
significant role in non-parametric tests of the hypothesis $F(x) = F_0(x)$, where
$F(x)$ is an absolutely continuous cumulative distribution function and $F_0(x)$ is
a specified absolutely continuous cumulative distribution function. In particular,
the $\chi^2$ goodness of fit test, the empty cell test, and the likelihood ratio test
(based on the multinomial distribution) all are expressible in terms of occupancy
numbers. For each of these tests, the customary procedure (but not the only one
possible) is to select an integer $N$ in advance of the experiment; then divide
the real line into $N$ consecutive intervals each of which has probability $N^{-1}$
under $F_0(x)$. Thus, when the hypothesis is true, the distribution of the
observations, when classified only by the interval in which they fall and ignoring
the natural ordering of the intervals, is the multinomial distribution with equal cell
probabilities.
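As an illustration of the procedure just described, the sketch below cuts the line into $N$ equiprobable intervals under a hypothesized $F_0$ and writes two of the test statistics in terms of occupancy numbers. The choice of the unit exponential distribution for $F_0$ and all function names are assumptions made for the example. The Pearson statistic $\sum_j (O_j - n/N)^2/(n/N)$ is regrouped by the count $i$ as $(N/n)\sum_i s_i\,(i - n/N)^2$.

```python
import math
import random
from collections import Counter

def exponential_cdf(x):
    """Hypothetical F_0 for this sketch: the unit exponential distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def cell_index(x, N, F0):
    """Index of the interval containing x when the line is cut into N
    consecutive intervals of probability 1/N each under F0."""
    return min(int(N * F0(x)), N - 1)

def occupancy_from_cells(cells, N):
    """Occupancy numbers [s_0, ..., s_n] from a list of cell indices."""
    n = len(cells)
    per_cell = Counter(cells)                     # observations per non-empty cell
    count_of_counts = Counter(per_cell.values())  # how many cells hold i observations
    s = [count_of_counts.get(i, 0) for i in range(n + 1)]
    s[0] = N - len(per_cell)                      # empty cells
    return s

def chi_square_from_occupancy(s, N, n):
    """Pearson statistic with equal expected counts n/N, grouped by count i."""
    expected = n / N
    return sum(s_i * (i - expected) ** 2 for i, s_i in enumerate(s)) / expected

rng = random.Random(1)                            # fixed seed, illustrative only
N, n = 8, 20
data = [rng.expovariate(1.0) for _ in range(n)]
cells = [cell_index(x, N, exponential_cdf) for x in data]
s = occupancy_from_cells(cells, N)
empty_cell_statistic = s[0]                       # the empty cell test uses s_0
chi_square = chi_square_from_occupancy(s, N, n)
```

Both statistics depend on the data only through the $s_i$, which is the sense in which the ordering of the intervals is ignored.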

In this paper, we will study the limiting distribution of $s_i$, $i = 1, 2, \ldots, k$;
$k$ fixed and independent of $n$ and $N$, as $n, N \to \infty$ so that $n/N \to \alpha$,
$0 < \alpha < \infty$.

Under the hypotheses of this paper, I. Weiss [9] and M. Okamoto [6] established
independently that $(s_0 - E(s_0))/\sigma_{s_0}$ has a limiting standard normal
distribution. Weiss and Okamoto both employed the method of moments in their
investigation. Subsequently, Renyi [7] reexamined the limiting distribution of
$s_0$ using generating functions. The limiting distribution of $s_0$ under alternative
hypotheses was examined by S. Kitabatake [5] and V. P. Chistyakov [1].

Sevast'yanov and Chistyakov [8], using saddlepoint methods, established
the joint asymptotic normality of any subset of $(s_0, s_1, \ldots, s_p)$ and this was
extended to alternative hypotheses by Chistyakov and Viktorova [2].

In this paper, we study the asymptotic distribution of $s_i$ by using the method
of moments. Despite the fact that the asymptotic normality has been previously
established, it was felt that information concerning the rate of convergence of
the standardized central moments would prove useful and lead to improvements
in probability estimates over those specifically given by the limiting normal
distribution. In the Sevast'yanov and Chistyakov [8] and the Chistyakov and
Viktorova [2] papers only the moments of order one and two are reported and
for these only the leading terms of their asymptotic development are reported.

The methods of this paper can be extended to exhibit the joint asymptotic normality
of any subset of $(s_0, s_1, \ldots, s_p)$, but this extension would be very tedious.

The complete asymptotic expansion of the standardized central moments of $s_0$
is implicit in Weiss's paper [9], but the specific details are not provided therein.

In another paper (Harris and Park [3]), we have studied the limiting distribution
of linear combinations of the occupancy numbers, since this is precisely
the form in which the occupancy numbers enter into various non-parametric tests.
The results in this paper have been useful in pursuing that investigation.

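2. The Moments of the Occupancy Numbers. In Wilks [10], p. 433, the joint
distribution of $s_0, s_1, s_2, \ldots, s_n$ is given by

$$P(s_0, s_1, \ldots, s_n) = \frac{n!\,N!}{N^{n}\,(0!)^{s_0}(1!)^{s_1}\cdots(n!)^{s_n}\; s_0!\,s_1!\cdots s_n!},$$

where $s_i \ge 0$, $\sum_{i=0}^{n} s_i = N$, $\sum_{i=0}^{n} i\,s_i = n$. The $v$-th factorial moment (Wilks [10],
p. 153 or 433) is given by

$$E\big(s_i^{(v)}\big) = \frac{N^{(v)}\, n^{(iv)}}{(i!)^{v}\, N^{iv}} \left(1 - \frac{v}{N}\right)^{n-iv},$$

where $v \le N$, $iv \le n$. Thus, we can write

$$E\big(s_i^{(v)}\big) = \frac{N^{(v)}}{(i!)^{v}} \left(\frac{n}{N}\right)^{iv} \left(1 - \frac{v}{N}\right)^{n-iv} h(N,n,i,v),$$

where

$$h(N,n,i,v) = \begin{cases} 1, & iv = 0, \\[4pt] \displaystyle\prod_{t=0}^{iv-1}\left(1 - \frac{t}{n}\right), & iv > 0. \end{cases}$$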
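The factorial moment formula in its form with $h(N,n,i,v)$ can be checked against brute-force enumeration for very small $N$ and $n$. The sketch below uses exact rational arithmetic. The helper names are illustrative, and the enumeration over all $N^n$ assignments is only feasible for tiny cases.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def falling(x, k):
    """Falling factorial x^(k) = x (x-1) ... (x-k+1), with x^(0) = 1."""
    out = 1
    for t in range(k):
        out *= x - t
    return out

def h(N, n, i, v):
    """h(N,n,i,v) = prod_{t=0}^{iv-1} (1 - t/n); the empty product is 1.
    N is kept only to mirror the paper's notation."""
    out = Fraction(1)
    for t in range(i * v):
        out *= 1 - Fraction(t, n)
    return out

def factorial_moment(N, n, i, v):
    """E(s_i^(v)) in the form with h, for v <= N and i*v <= n."""
    return (Fraction(falling(N, v), factorial(i) ** v)
            * Fraction(n, N) ** (i * v)
            * (1 - Fraction(v, N)) ** (n - i * v)
            * h(N, n, i, v))

def factorial_moment_brute(N, n, i, v):
    """Average of s_i^(v) over all N^n equally likely assignments."""
    total = 0
    for assignment in product(range(N), repeat=n):
        s_i = sum(1 for c in range(N) if assignment.count(c) == i)
        total += falling(s_i, v)
    return Fraction(total, N ** n)

# Compare the closed form with enumeration for N = 3 cells, n = 4 observations.
for i in range(3):
    for v in (1, 2):
        assert factorial_moment(3, 4, i, v) == factorial_moment_brute(3, 4, i, v)
```

The factor $h$ is exactly $n^{(iv)}/n^{iv}$, which is why the two displayed forms of the factorial moment agree.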
Let $\mu_k^{(i)}$ be the $k$-th central moment of $s_i$ and let $\alpha_{j,k}$ and $\beta_{j,k}$ be the
Stirling numbers of the first and second kind respectively, defined by

$$x^{(k)} = \sum_{j=0}^{k} \alpha_{j,k}\, x^{j}$$

and

$$x^{k} = \sum_{j=0}^{k} \beta_{j,k}\, x^{(j)},$$

where $x^{(k)} = x(x-1)\cdots(x-k+1)$.

We adopt the conventions that $\alpha_{j,k} = \beta_{j,k} = 0$ unless $j = k = 0$, or $0 < j \le k$.
In particular $\alpha_{0,0} = \beta_{0,0} = 1$. Then,
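The Stirling numbers used here can be tabulated by their standard recurrences. The sketch below assumes the index order in which the first subscript is the power (or the order of the falling factorial) appearing on the right-hand side, so that `alpha[j][k]` multiplies $x^j$ in the expansion of $x^{(k)}$. That convention is inferred from how the numbers are used in the moment formulas, and the function names are illustrative.

```python
def falling(x, k):
    """Falling factorial x^(k) = x (x-1) ... (x-k+1), with x^(0) = 1."""
    out = 1
    for t in range(k):
        out *= x - t
    return out

def stirling_first(k_max):
    """alpha[j][k] with x^(k) = sum_j alpha[j][k] x^j (signed first kind)."""
    alpha = [[0] * (k_max + 1) for _ in range(k_max + 1)]
    alpha[0][0] = 1
    for k in range(1, k_max + 1):
        for j in range(1, k + 1):
            # from x^(k) = x^(k-1) * (x - (k-1))
            alpha[j][k] = alpha[j - 1][k - 1] - (k - 1) * alpha[j][k - 1]
    return alpha

def stirling_second(k_max):
    """beta[j][k] with x^k = sum_j beta[j][k] x^(j) (second kind)."""
    beta = [[0] * (k_max + 1) for _ in range(k_max + 1)]
    beta[0][0] = 1
    for k in range(1, k_max + 1):
        for j in range(1, k + 1):
            # from x * x^(j) = x^(j+1) + j * x^(j)
            beta[j][k] = beta[j - 1][k - 1] + j * beta[j][k - 1]
    return beta

alpha = stirling_first(5)
beta = stirling_second(5)

# Verify both defining identities at several integer points.
for x in range(-3, 6):
    for k in range(6):
        assert falling(x, k) == sum(alpha[j][k] * x ** j for j in range(k + 1))
        assert x ** k == sum(beta[j][k] * falling(x, j) for j in range(k + 1))
```

Both tables satisfy the stated conventions: entries vanish unless $j = k = 0$ or $0 < j \le k$.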
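$$
\begin{aligned}
\mu_k^{(i)} &= E\big(s_i - E(s_i)\big)^{k} \\
&= \sum_{r=0}^{k} (-1)^{r}\binom{k}{r}\left[\frac{N}{i!}\left(\frac{n}{N}\right)^{i}\left(1-\frac{1}{N}\right)^{n-i} h(N,n,i,1)\right]^{r}
\sum_{p=0}^{k-r} \beta_{p,k-r}\,\frac{N^{(p)}}{(i!)^{p}}\left(\frac{n}{N}\right)^{ip}\left(1-\frac{p}{N}\right)^{n-ip} h(N,n,i,p) \\
&= \sum_{r=0}^{k}\sum_{p=0}^{k-r}\sum_{j=0}^{p} (-1)^{r}\binom{k}{r}\frac{N^{r+j}}{(i!)^{r+p}}\left(\frac{n}{N}\right)^{i(r+p)} \alpha_{j,p}\,\beta_{p,k-r} \\
&\qquad\times \left(1-\frac{1}{N}\right)^{r(n-i)}\left(1-\frac{p}{N}\right)^{n-ip} [h(N,n,i,1)]^{r}\, h(N,n,i,p).
\end{aligned}
\tag{5}
$$

We set $p + r = s$ and $j + r = l$ obtaining

$$
\begin{aligned}
\mu_k^{(i)} &= \sum_{r=0}^{k}\sum_{s=r}^{k}\sum_{l=r}^{s} (-1)^{r}\binom{k}{r}\frac{N^{l}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} \alpha_{l-r,\,s-r}\,\beta_{s-r,\,k-r} \\
&\qquad\times \left(1-\frac{1}{N}\right)^{r(n-i)}\left(1-\frac{s-r}{N}\right)^{n-i(s-r)} [h(N,n,i,1)]^{r}\, h(N,n,i,s-r).
\end{aligned}
\tag{6}
$$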


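In the asymptotic analysis of (6), we will frequently employ the following
relationships. If $N, n \to \infty$ so that $n/N \to \alpha$, $0 < \alpha < \infty$, then for each fixed $u$,

$$
\left(1-\frac{u}{N}\right)^{n} = \exp\Big\{-n\sum_{j=1}^{\infty}\frac{1}{j}\Big(\frac{u}{N}\Big)^{j}\Big\}
= \exp\Big\{-n\sum_{j=1}^{\tau}\frac{1}{j}\Big(\frac{u}{N}\Big)^{j}\Big\}\big(1 + O(N^{-\tau})\big).
\tag{7}
$$

We will also employ the convention that $\sum_{i=a}^{b} a_i = 0$ whenever $b < a$.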
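Relation (7) can be explored numerically. The sketch below compares $(1-u/N)^n$ with the truncated exponential form and reports the relative error for growing $N$ with $n/N$ held fixed. The values of `alpha`, `u`, and `tau` are arbitrary illustrative choices, not taken from the paper.

```python
import math

def exact_power(u, N, n):
    """Left side of (7): (1 - u/N)^n."""
    return (1 - u / N) ** n

def truncated_form(u, N, n, tau):
    """exp{-n * sum_{j=1}^{tau} (u/N)^j / j}, the truncated right side of (7)."""
    series = sum((u / N) ** j / j for j in range(1, tau + 1))
    return math.exp(-n * series)

def relative_error(u, N, n, tau):
    """|truncated / exact - 1|, the quantity that (7) bounds by O(N^-tau)."""
    return abs(truncated_form(u, N, n, tau) / exact_power(u, N, n) - 1)

alpha, u, tau = 2, 3, 2          # illustrative values, not from the paper
for N in (100, 1000, 10000):
    n = alpha * N
    print(N, relative_error(u, N, n, tau))
```

Since the first omitted term of the series is $n\,(u/N)^{\tau+1}/(\tau+1)$, the printed error should shrink roughly like $N^{-\tau}$ as $N$ grows with $n/N$ fixed.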

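Now apply (7) to (6) and let $n, N \to \infty$, so that $n/N \to \alpha$, $0 < \alpha < \infty$, obtaining
for each fixed $i$, $1 \le i \le q$,

$$
\begin{aligned}
&\left(1-\frac{1}{N}\right)^{r(n-i)}\left(1-\frac{s-r}{N}\right)^{n-i(s-r)} [h(N,n,i,1)]^{r}\, h(N,n,i,s-r) \\
&\quad= \exp\Big\{-r(n-i)\sum_{t=1}^{\infty}\frac{1}{tN^{t}} - \big(n-i(s-r)\big)\sum_{t=1}^{\infty}\frac{(s-r)^{t}}{tN^{t}}
 - r\sum_{u=0}^{i-1}\sum_{t=1}^{\infty}\frac{u^{t}}{t\,n^{t}} - \sum_{u=0}^{i(s-r)-1}\sum_{t=1}^{\infty}\frac{u^{t}}{t\,n^{t}}\Big\} \\
&\quad= \exp\Big\{-s\,\frac{n}{N} - n\sum_{t=1}^{\infty}\frac{r+(s-r)^{t+1}}{(t+1)N^{t+1}} + i\sum_{t=1}^{\infty}\frac{r+(s-r)^{t+1}}{tN^{t}}
 - r\sum_{t=1}^{\infty}\sum_{u=0}^{i-1}\frac{u^{t}}{t\,n^{t}} - \sum_{t=1}^{\infty}\sum_{u=0}^{i(s-r)-1}\frac{u^{t}}{t\,n^{t}}\Big\}.
\end{aligned}
\tag{8}
$$

Thus,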


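$$
\begin{aligned}
&\left(1-\frac{1}{N}\right)^{r(n-i)}\left(1-\frac{s-r}{N}\right)^{n-i(s-r)} [h(N,n,i,1)]^{r}\, h(N,n,i,s-r) \\
&\quad= \exp\Big\{-s\,\frac{n}{N} - \sum_{t=1}^{\tau}\frac{1}{N^{t}}\Big[\frac{n}{N}\,\frac{r+(s-r)^{t+1}}{t+1} - i\,\frac{r+(s-r)^{t+1}}{t}
 + \Big(\frac{N}{n}\Big)^{t}\frac{1}{t}\Big(r\sum_{u=0}^{i-1}u^{t} + \sum_{u=0}^{i(s-r)-1}u^{t}\Big)\Big]\Big\}\big(1+O(N^{-\tau})\big).
\end{aligned}
\tag{9}
$$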


Observe that the exponent in (9) is of the form

$$-s\,\frac{n}{N} + \sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}},$$

where $P_{t+1}(r)$ is a polynomial in $r$ of degree at most $t+1$ with coefficients
depending on $n/N$, $i$, and $s$. Now we expand $\exp\big\{\sum_{t=1}^{\tau} P_{t+1}(r)/N^{t}\big\}$ obtaining


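$$\exp\Big\{\sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}}\Big\} = \sum_{j=0}^{\infty}\frac{1}{j!}\Big(\sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}}\Big)^{j}. \tag{10}$$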
Lemma 1. If $N, n \to \infty$ so that $n/N \to \alpha > 0$ and $i \le q$, then for any pair of
positive integers $p$ and $\tau$,

$$
\begin{aligned}
&\left(1-\frac{1}{N}\right)^{r(n-i)}\left(1-\frac{s-r}{N}\right)^{n-i(s-r)} [h(N,n,i,1)]^{r}\, h(N,n,i,s-r) \\
&\quad= e^{-s n/N}\Big[\sum_{j=0}^{p}\frac{1}{j!}\Big(\sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}}\Big)^{j} + O(N^{-p-1})\Big]\big(1+O(N^{-\tau})\big),
\end{aligned}
\tag{12}
$$

where $P_{t+1}(r)$ is a polynomial of degree at most $t+1$ in $r$.

We now establish the following.

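Lemma 2. Under the hypotheses of lemma 1,

$$\sum_{j=0}^{p}\frac{1}{j!}\Big(\sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}}\Big)^{j} = \sum_{m=0}^{\tau p}\frac{K_m(r,s-r,i)}{N^{m}}, \tag{13}$$

where for each $m \le p$, $K_m(r,s-r,i)$ is a polynomial in $r$ of exact degree $2m$
whenever $n \ne iN$. The coefficient of $r^{2m}$ is

$$\frac{(-1)^{m}}{m!}\left(\frac{1}{2}\,\frac{n}{N} - i + \frac{i^{2}}{2}\,\frac{N}{n}\right)^{m}.$$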


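Proof:

$$\sum_{j=0}^{p}\frac{1}{j!}\Big(\sum_{t=1}^{\tau}\frac{P_{t+1}(r)}{N^{t}}\Big)^{j}
= \sum_{j=0}^{p}\frac{1}{j!}\sum \frac{j!}{k_1!\,k_2!\cdots k_\tau!}\Big[\frac{P_2(r)}{N}\Big]^{k_1}\Big[\frac{P_3(r)}{N^{2}}\Big]^{k_2}\cdots\Big[\frac{P_{\tau+1}(r)}{N^{\tau}}\Big]^{k_\tau},$$

the inner sum running over non-negative integers $k_1, k_2, \ldots, k_\tau$ with $k_1 + k_2 + \cdots + k_\tau = j$.
Collecting terms by powers of $N$, we get

$$\sum_{m=0}^{\tau p}\frac{1}{N^{m}}\sum \frac{[P_2(r)]^{k_1}[P_3(r)]^{k_2}\cdots[P_{\tau+1}(r)]^{k_\tau}}{k_1!\,k_2!\cdots k_\tau!},$$

the second sum running over $k_1, k_2, \ldots, k_\tau$ with $\sum_{j=1}^{\tau} j k_j = m$ and $\sum_{j=1}^{\tau} k_j \le p$.
The degree of each term of $K_m$ is at most

$$\sum_{j=1}^{\tau}(j+1)k_j = m + \sum_{j=1}^{\tau}k_j \le 2m,$$

since $\sum_{j=1}^{\tau} k_j \le \sum_{j=1}^{\tau} j k_j = m$.

Further, for each $m \le p$, set $k_1 = m$, $k_2 = k_3 = \cdots = k_\tau = 0$ obtaining the term
$[P_2(r)]^{m}/m!$, which is of degree $2m$, since the coefficient of $r^{2m}$ in $[P_2(r)]^{m}$ is
$(-1)^{m}(n/2N - i + N i^{2}/2n)^{m}$ and is non-zero by hypothesis. This is clearly the only term of
degree $2m$.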
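The role of the hypothesis $n \ne iN$ becomes transparent once the leading coefficient is put over a common denominator. The following short identity is a worked step added for illustration; it uses only the quantities already appearing in Lemma 2.

```latex
\[
\frac{n}{2N} - i + \frac{N i^{2}}{2n}
  = \frac{n^{2} - 2\,i\,nN + i^{2}N^{2}}{2nN}
  = \frac{(n - iN)^{2}}{2nN} .
\]
```

The expression is therefore non-negative and vanishes exactly when $n = iN$. As $n/N \to \alpha$ it tends to $(\alpha - i)^{2}/(2\alpha)$.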

The  following  lemma  can  now  be  established. 

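Lemma 3. For $N, n$ sufficiently large, $n \ne iN$, and $m \le p$,

$$
\sum_{r=0}^{k}(-1)^{r}\binom{k}{r}\,\alpha_{u+m-r,\,s-r}\;\beta_{s-r,\,k-r}\;K_m(r,s-r,i)
= \begin{cases} 0, & u > k/2, \\ c\,k!, & u = k/2, \end{cases}
\tag{14}
$$

where

$$
c = \frac{C_{s-k/2-m}}{[2(s-k/2-m)]!}\cdot\frac{D_{k-s}}{[2(k-s)]!}\cdot\frac{(-1)^{m}}{m!}\left(\frac{1}{2}\,\frac{n}{N} - i + \frac{i^{2}}{2}\,\frac{N}{n}\right)^{m},
\qquad
C_\nu = (-1)^{\nu} D_\nu = (-1)^{\nu}\prod_{j=1}^{\nu}(2j-1).
\tag{15}
$$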
j=l 
Proof: It is well known (see Jordan [4], p. 151 and p. 171) that $\alpha_{n-\nu,n}$ and
$\beta_{n-\nu,n}$ are polynomials in $n$ of degree $2\nu$ with leading coefficients $C_\nu/(2\nu)!$
and $D_\nu/(2\nu)!$ respectively. Then,

$$f(r) = \alpha_{u+m-r,\,s-r}\;\beta_{s-r,\,k-r}\;K_m(r,s-r,i)$$

is a polynomial in $r$ of degree $2(k-u)$. Thus, the left hand side of (14) is

$$
\sum_{r=0}^{k}(-1)^{r}\binom{k}{r} f(r) = (-1)^{k}\Delta^{k} f(0)
= \begin{cases} 0, & u > k/2, \\ c\,k!, & u = k/2. \end{cases}
$$
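The proof rests on a standard fact about finite differences: the alternating binomial sum of a polynomial over $r = 0, \ldots, k$ vanishes when the degree is below $k$ and equals $(-1)^k k!$ times the leading coefficient when the degree is exactly $k$. A minimal check of that fact, with arbitrary test polynomials chosen only for illustration:

```python
from math import comb, factorial

def alternating_binomial_sum(f, k):
    """sum_{r=0}^{k} (-1)^r C(k, r) f(r), i.e. (-1)^k times the k-th
    forward difference of f at 0."""
    return sum((-1) ** r * comb(k, r) * f(r) for r in range(k + 1))

def cubic(r):
    """Arbitrary degree-3 polynomial with leading coefficient 5."""
    return 5 * r ** 3 - r + 2

def quartic(r):
    """Arbitrary degree-4 polynomial with leading coefficient 7."""
    return 7 * r ** 4 + r

# Degree 3 < k = 4: the sum vanishes.
assert alternating_binomial_sum(cubic, 4) == 0
# Degree 3 = k: the sum is (-1)^3 * 3! * (leading coefficient).
assert alternating_binomial_sum(cubic, 3) == -factorial(3) * 5
# Degree 4 = k with k even: the sign is +1, as in the case u = k/2 of (14).
assert alternating_binomial_sum(quartic, 4) == factorial(4) * 7
```

In the lemma, $f$ has degree $2(k-u)$, so the sum vanishes for $u > k/2$ and picks out the leading coefficient when $u = k/2$.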


We  now  return  to  our  examination  of  the  central  moments  and  obtain  the  following 
theorem. 

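Theorem 1. If $N, n \to \infty$ so that $n/N \to \alpha > 0$, then for every fixed $i \le q$ and
each fixed $k$,

$$
\mu_k^{(i)} = N^{k/2}\, D_{k/2}\left(\frac{\alpha^{i}}{i!}\,e^{-\alpha}\right)^{k/2}
\left[1 - \left(1 + \frac{(\alpha-i)^{2}}{\alpha}\right)\frac{\alpha^{i}}{i!}\,e^{-\alpha}\right]^{k/2} + O\big(N^{k/2-1}\big),
\qquad k \text{ even};
\tag{16}
$$

$$
\lim_{n,N\to\infty}\frac{\mu_k^{(i)}}{\big[\mu_2^{(i)}\big]^{k/2}}
= \begin{cases} \displaystyle\prod_{\nu=1}^{k/2}(2\nu-1), & k \text{ even}, \\[6pt] 0, & k \text{ odd}. \end{cases}
\tag{17}
$$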


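Proof: From (6), (12), and (13), we have

$$
\begin{aligned}
\mu_k^{(i)} &= \sum_{r=0}^{k}\sum_{s=r}^{k}\sum_{l=r}^{s}(-1)^{r}\binom{k}{r}\frac{N^{l}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is}\alpha_{l-r,\,s-r}\,\beta_{s-r,\,k-r}\;e^{-sn/N} \\
&\qquad\times \Big[\sum_{m=0}^{\tau p}\frac{K_m(r,s-r,i)}{N^{m}} + O(N^{-p-1})\Big]\big(1+O(N^{-\tau})\big).
\end{aligned}
\tag{18}
$$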
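Letting $u = l - m$ and interchanging the order of summation, we have,

$$
\begin{aligned}
\mu_k^{(i)} &= \sum_{u=-\tau p}^{k}\;\sum_{s=u}^{k}\;\sum_{m=\max(0,-u)}^{\min(s-u,\,\tau p)}\;\sum_{r=0}^{u+m}(-1)^{r}\binom{k}{r}\frac{N^{u}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} \\
&\qquad\times \alpha_{u+m-r,\,s-r}\,\beta_{s-r,\,k-r}\;e^{-sn/N}\,\big(K_m(r,s-r,i) + O(N^{m-p-1})\big)\big(1+O(N^{-\tau})\big).
\end{aligned}
\tag{19}
$$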


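Since $\alpha_{p,\nu} = 0$ for $p < 0$, we can extend the upper limit of the sum on $r$ to $k$,
obtaining

$$
\begin{aligned}
\mu_k^{(i)} &= \sum_{u=-\tau p}^{k} N^{u}\sum_{s=u}^{k}\;\sum_{m=\max(0,-u)}^{\min(s-u,\,\tau p)}\frac{e^{-sn/N}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is}
\sum_{r=0}^{k}(-1)^{r}\binom{k}{r}\alpha_{u+m-r,\,s-r}\,\beta_{s-r,\,k-r}\,K_m(r,s-r,i) \\
&\qquad+ \sum_{u=-\tau p}^{k} O(N^{u-\tau}) + \sum_{u=-\tau p}^{k}\;\sum_{m=\max(0,-u)}^{\min(\tau p,\,k-u)} O(N^{u+m-p-1}).
\end{aligned}
\tag{20}
$$

Let

$$a_{s,u}(i,k) = \sum_{m=\max(0,-u)}^{\min(s-u,\,\tau p)}\;\sum_{r=0}^{k}(-1)^{r}\binom{k}{r}\alpha_{u+m-r,\,s-r}\,\beta_{s-r,\,k-r}\,K_m(r,s-r,i);$$

then, since $a_{s,u}(i,k) = 0$ for $s < 0$,

$$
\mu_k^{(i)} = \sum_{u=-\tau p}^{k} N^{u}\sum_{s=\max(0,u)}^{k}\frac{e^{-sn/N}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} a_{s,u}(i,k) + O(N^{k-\tau}) + O(N^{k-p-1}).
\tag{21}
$$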

If $n \ne iN$, we can apply lemma 3. Here we choose $\tau$ and $p$ larger than $k$ so that
the upper limit of summation on $m$ is $s-u$ for $u \ge 0$. Then, $a_{s,u}(i,k) = 0$ for
$u > k/2$. Thus, the upper summation limit of $u$ becomes $[k/2]$. Hence, for
$k$ even

$$
\mu_k^{(i)} = N^{k/2}\sum_{s=k/2}^{k}\frac{e^{-sn/N}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} a_{s,k/2}(i,k) + R(k,N,n,i) + O(N^{k-\lambda}),
\tag{22}
$$

where $\lambda = \min(\tau, p+1)$ and

$$R(k,N,n,i) = \sum_{u=-\tau p}^{[k/2]-1} N^{u}\sum_{s=u}^{k}\frac{e^{-sn/N}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} a_{s,u}(i,k).$$

For $k$ odd, we have,

$$
\mu_k^{(i)} = N^{(k-1)/2}\sum_{s=(k-1)/2}^{k}\frac{e^{-sn/N}}{(i!)^{s}}\left(\frac{n}{N}\right)^{is} a_{s,(k-1)/2}(i,k) + R(k,N,n,i) + O(N^{k-\lambda}).
\tag{23}
$$


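From the proof of lemma 3, since $a_{s,u}(i,k)$ is a sum of polynomials of degree $\le 2(k-u)$,
$a_{s,u}(i,k)$ is itself a polynomial of degree $\le 2(k-u)$. Further for $n$ and $N$
both sufficiently large, $a_{s,u}(i,k)$ is uniformly bounded in $u$, $-\tau p \le u \le k$.
Hence $R(k,N,n,i) = O(N^{[k/2]-1})$. Choose $\lambda > 2k + 1$. Then, for $k$ even

$$
\begin{aligned}
\lim_{N,n\to\infty} a_{s,k/2}(i,k)
&= k!\sum_{m=0}^{s-k/2}\frac{(-1)^{s-k/2-m}}{2^{\,s-k/2-m}\,(s-k/2-m)!}\cdot\frac{1}{2^{\,k-s}\,(k-s)!}\cdot\frac{(-1)^{m}}{m!}\left(\frac{(\alpha-i)^{2}}{2\alpha}\right)^{m} \\
&= k!\sum_{m=0}^{s-k/2}\frac{(-1)^{s-k/2}}{2^{k/2}\,(s-k/2-m)!\,(k-s)!\,m!}\left(\frac{(\alpha-i)^{2}}{\alpha}\right)^{m} \\
&= \frac{k!\,(-1)^{s-k/2}}{2^{k/2}\,(k-s)!\,(s-k/2)!}\sum_{m=0}^{s-k/2}\binom{s-k/2}{m}\left(\frac{(\alpha-i)^{2}}{\alpha}\right)^{m} \\
&= \frac{k!\,(-1)^{s-k/2}}{2^{k/2}\,(k-s)!\,(s-k/2)!}\left(1 + \frac{(\alpha-i)^{2}}{\alpha}\right)^{s-k/2}.
\end{aligned}
\tag{24}
$$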


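Thus for $k$ even,

$$
\begin{aligned}
\lim_{n,N\to\infty}\frac{\mu_k^{(i)}}{N^{k/2}}
&= \sum_{s=k/2}^{k}\left(\frac{\alpha^{i}e^{-\alpha}}{i!}\right)^{s}\frac{k!\,(-1)^{s-k/2}}{2^{k/2}\,(k-s)!\,(s-k/2)!}\left(1 + \frac{(\alpha-i)^{2}}{\alpha}\right)^{s-k/2} \\
&= D_{k/2}\left(\frac{\alpha^{i}e^{-\alpha}}{i!}\right)^{k/2}\left[1 - \left(1 + \frac{(\alpha-i)^{2}}{\alpha}\right)\frac{\alpha^{i}e^{-\alpha}}{i!}\right]^{k/2}.
\end{aligned}
\tag{25}
$$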

For $k$ odd, the conclusion for $n \ne iN$ follows from (23) and $R(k,N,n,i) = O(N^{[k/2]-1})$.

For $n = iN$, the conclusion follows from the continuity of $\mu_k^{(i)}/N^{k/2}$
in $\alpha$. To see this, observe that (21) is a finite sum and that for $N$ sufficiently
large, $n = \alpha N + o(N)$. Substitution of this into (21) and application of some
elementary analysis permits one to verify the continuity of the limit (25) in $\alpha$.

Corollary. Under the hypotheses of theorem 1, $(s_i - E(s_i))/\sigma_{s_i}$ has a limiting
standard normal distribution.

Proof: This is immediate upon noting that $\lim_{n,N\to\infty} \mu_k^{(i)}/[\mu_2^{(i)}]^{k/2}$ are the
moments of the standard normal distribution.
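The limiting statements can be probed numerically for moderate $N$. The sketch below computes the second and fourth central moments of $s_i$ exactly from the factorial moments, using rational arithmetic, and compares them with the normal-type limits: variance close to $N\,p\,[1 - (1 + (\alpha-i)^2/\alpha)\,p]$ with $p = \alpha^i e^{-\alpha}/i!$, and $\mu_4/\mu_2^2$ close to 3. The function names and the choice $N = 300$, $n = 600$, $i = 1$ are illustrative assumptions, and the tolerances are deliberately loose.

```python
from fractions import Fraction
from math import comb, exp, factorial

def falling(x, k):
    """Falling factorial x^(k), with x^(0) = 1."""
    out = 1
    for t in range(k):
        out *= x - t
    return out

def factorial_moment(N, n, i, v):
    """Exact E(s_i^(v)) = N^(v) n^(iv) / ((i!)^v N^(iv)) * (1 - v/N)^(n - iv)."""
    return (Fraction(falling(N, v) * falling(n, i * v),
                     factorial(i) ** v * N ** (i * v))
            * (1 - Fraction(v, N)) ** (n - i * v))

# Rows of Stirling numbers of the second kind: x^k = sum_p S(k, p) x^(p).
STIRLING2_ROWS = {0: [1], 1: [0, 1], 2: [0, 1, 1],
                  3: [0, 1, 3, 1], 4: [0, 1, 7, 6, 1]}

def raw_moment(N, n, i, k):
    """E(s_i^k) for k <= 4, from the factorial moments."""
    return sum(S * factorial_moment(N, n, i, p)
               for p, S in enumerate(STIRLING2_ROWS[k]))

def central_moment(N, n, i, k):
    """k-th central moment of s_i by binomial expansion about the mean."""
    mean = raw_moment(N, n, i, 1)
    return sum(comb(k, r) * (-mean) ** r * raw_moment(N, n, i, k - r)
               for r in range(k + 1))

def limiting_variance_rate(alpha, i):
    """Leading coefficient of N in the variance of s_i."""
    p = alpha ** i * exp(-alpha) / factorial(i)
    return p * (1 - p * (1 + (alpha - i) ** 2 / alpha))

N, n, i = 300, 600, 1                      # illustrative values, alpha = 2
mu2 = central_moment(N, n, i, 2)
mu4 = central_moment(N, n, i, 4)
variance_ratio = float(mu2) / (N * limiting_variance_rate(n / N, i))
kurtosis_ratio = float(mu4) / float(mu2) ** 2
print(variance_ratio, kurtosis_ratio)      # expected to be near 1 and near 3
```

Exact rational arithmetic avoids the cancellation that would affect a floating-point evaluation of the alternating sums; it is practical only for moderate $N$ and low-order moments.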

Remark. The methods of this section are a direct extension of those used by
I. Weiss [9]. We have however extended the analysis to $s_i$, $i \ne 0$, whereas
Weiss restricted his attention to $s_0$. The procedure used herein also gives a
complete asymptotic expansion for $\mu_k^{(i)}$ and thus contains additional information
on the limiting behavior beyond the statement of the corollary.